Semantic Similarity of Arabic Sentences with Word Embeddings
Authors
Abstract
Semantic textual similarity is the basis of countless applications and plays an important role in diverse areas such as information retrieval, plagiarism detection, information extraction, and machine translation. This article proposes an innovative word embedding-based system for computing the semantic similarity of Arabic sentences. The main idea is to exploit vectors as word representations in a multidimensional space in order to capture the semantic and syntactic properties of words. IDF weighting and Part-of-Speech tagging are applied to the examined sentences to support the identification of the words that are most descriptive of each sentence. The performance of our proposed system is confirmed through the Pearson correlation between our assigned semantic similarity scores and human judgments.
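The paper itself provides no implementation, but the pipeline it describes (POS filtering, IDF weighting of word vectors, a sentence-level similarity score, and evaluation by Pearson correlation against human judgments) can be illustrated with a minimal sketch. Everything below is an illustrative assumption: the helper names, the choice of content POS tags, and the cosine-of-weighted-average formulation are one plausible reading of the approach, not the authors' actual code.

```python
# Minimal sketch of an IDF-weighted, POS-filtered embedding similarity system.
# `vectors` maps tokens to embedding arrays, `idf` maps tokens to IDF weights;
# both are assumed to be precomputed elsewhere (hypothetical inputs).
import numpy as np
from scipy.stats import pearsonr

CONTENT_TAGS = {"NOUN", "VERB", "ADJ"}  # hypothetical set of descriptive POS tags


def sentence_vector(tokens, pos_tags, vectors, idf):
    """IDF-weighted average of the embeddings of content words."""
    weighted = [
        idf.get(tok, 1.0) * vectors[tok]
        for tok, tag in zip(tokens, pos_tags)
        if tag in CONTENT_TAGS and tok in vectors
    ]
    return np.mean(weighted, axis=0) if weighted else None


def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def sentence_similarity(s1, s2, vectors, idf):
    """Similarity of two sentences given as (tokens, pos_tags) pairs."""
    v1 = sentence_vector(*s1, vectors, idf)
    v2 = sentence_vector(*s2, vectors, idf)
    return 0.0 if v1 is None or v2 is None else cosine(v1, v2)


# Evaluation against gold human scores (hypothetical data):
# system_scores = [sentence_similarity(a, b, vectors, idf) for a, b in pairs]
# r, _ = pearsonr(system_scores, human_scores)
```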
Similar Papers
Penn: Using Word Similarities to better Estimate Sentence Similarity
We present the Penn system for SemEval2012 Task 6, computing the degree of semantic equivalence between two sentences. We explore the contributions of different vector models for computing sentence and word similarity: Collobert and Weston embeddings as well as two novel approaches, namely eigenwords and selectors. These embeddings provide different measures of distributional similarity between...
Evaluating Multimodal Representations on Sentence Similarity: vSTS, Visual Semantic Textual Similarity Dataset
The success of word representations (embeddings) learned from text has motivated analogous methods to learn representations of longer sequences of text such as sentences, a fundamental step on any task requiring some level of text understanding [13]. Sentence representation is a challenging task that has to consider aspects such as compositionality, phrase similarity, negation, etc. In order to...
Joint Unsupervised Learning of Semantic Representation of Words and Roles in Dependency Trees
In this paper, we introduce WoRel, a model that jointly learns word embeddings and a semantic representation of word relations. The model learns from plain text sentences and their dependency parse trees. The word embeddings produced by WoRel outperform Skip-Gram and GloVe in word similarity and syntactical word analogy tasks and have comparable results on word relatedness and semantic word ana...
OPI-JSA at SemEval-2017 Task 1: Application of Ensemble learning for computing semantic textual similarity
Semantic Textual Similarity (STS) evaluation assesses the degree to which two parts of texts are similar, based on their semantic evaluation. In this paper, we describe three models submitted to STS SemEval 2017. Given two English parts of a text, each of the proposed methods outputs an assessment of their semantic similarity. We propose an approach for computing monolingual semantic textual simil...
Bilingual Word Embeddings for Phrase-Based Machine Translation
We introduce bilingual word embeddings: semantic embeddings associated across two languages in the context of neural language models. We propose a method to learn bilingual embeddings from a large unlabeled corpus, while utilizing MT word alignments to constrain translational equivalence. The new embeddings significantly outperform baselines in word semantic similarity. A single semantic simil...